Power and Sample Sizes
The Power of a Test
Suppose we have a method to test a null hypothesis $H_0$ against an alternative hypothesis $H_1$. The test is "controlled" at some level $\alpha$, i.e. $P[\text{reject } H_0] \leq \alpha$ whenever $H_0$ is true.
On the other hand, when $H_0$ is false one wants the probability of rejecting $H_0$ to be as high as possible.
If the parameter to be tested is $\theta$, $\theta_0$ is a value within $H_0$ and $\theta_1$ is in $H_1$, then we want $P_{\theta_0}[\text{reject } H_0] \leq \alpha$ and $P_{\theta_1}[\text{reject } H_0]$ as large as possible.
For a general $\theta$ we write:
$$\beta(\theta) = P_\theta[\text{reject } H_0]$$
for the power of the test.
Details
Note that we do not use the phrase "accept $H_0$": a test either rejects the null hypothesis or fails to reject it.
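To make the definition of the power $\beta(\theta)$ concrete, here is a small, purely illustrative R sketch (the distribution, sample size and rejection rule are chosen only for this illustration and are not part of the examples below): for $X \sim Bin(10, p)$, test $H_0: p = \frac{1}{2}$ and reject if $x \leq 1$ or $x \geq 9$.
# Illustrative only: X ~ Bin(10, p), reject H0: p = 1/2 if x <= 1 or x >= 9
beta <- function(p) pbinom(1, 10, p) + pbinom(8, 10, p, lower.tail = FALSE)
beta(0.5)   # the level of the test, about 0.021
beta(0.8)   # the power when the true p is 0.8, about 0.38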
The Power of Tests for Proportions
Examples
Suppose 7 students are involved in an experiment which comprises 7 trials, and each trial consists of rolling a die 9 times.
Experiment 1: A student records a 0 if they toss an even number (2, 4, 6) (i.e. failure), and records a 1 if they toss an odd number (1, 3, 5) (i.e. success). After tossing the die 9 times and recording a 0 or a 1 each time, the student tabulates the number of 1's. This process is repeated six more times.
Data and outcomes:
$X$ equals the number of successes in the $n = 9$ trials.
Thus, $x$ equals the number of odd numbers obtained.
Question: Test whether the die is fair with respect to odd and even outcomes, that is $H_0: p = \frac{1}{2}$ vs. $H_1: p \neq \frac{1}{2}$.
Solution: Now, $x$ is an outcome of $X \sim Bin(n, p)$. We know from the CLT that
$$\hat{p} = \frac{X}{n} \sim N\left(p, \frac{p(1-p)}{n}\right)$$
approximately. Write $p_0 = \frac{1}{2}$, so if $H_0: p = p_0$ is true, then
$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$$
is approximately $N(0, 1)$, so we reject $H_0$
if the observed value
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$$
is such that $|z| > z_{1-\alpha/2}$.
Outcomes from 21 trials:
So we do not reject the null hypothesis!
We can rewrite the test statistic slightly:
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{x - np_0}{\sqrt{np_0(1-p_0)}}.$$
Note that we reject $H_0$ if $|z| > z_{1-\alpha/2} = 1.96$, i.e. (with $n = 9$ and $p_0 = \frac{1}{2}$) if $x \geq 8$ (for $z > 1.96$) or $x \leq 1$ (for $z < -1.96$).
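As a check, the actual level of this rejection rule can be computed exactly from the binomial distribution; a minimal R sketch, assuming the rule $x \leq 1$ or $x \geq 8$ derived above:
# Exact level of "reject if x <= 1 or x >= 8" when X ~ Bin(9, 1/2)
pbinom(1, 9, 0.5) + pbinom(7, 9, 0.5, lower.tail = FALSE)
# about 0.039, somewhat below the nominal 0.05 because of the discreteness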
Experiment 2: The procedure is the same as in Experiment 1, but now the student records a 0 for a 1 or 2 (failure) and a 1 for a 3, 4, 5, or 6 (success).
Data and outcomes:
$X$ equals the number of successes in the $n = 9$ trials. Thus, $x$ equals the number of recorded 1's.
Solution: Outcomes from 21 experiments:
This time our test is still $H_0: p = \frac{1}{2}$ vs. $H_1: p \neq \frac{1}{2}$, but the true success probability is now $p = \frac{2}{3}$. Note that we reject $H_0$ if $x \geq 8$ (for $z > 1.96$) or if $x \leq 1$ (for $z < -1.96$).
We reject in 3 out of 21 trials.
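The 3 rejections out of 21 trials can be compared with a larger simulation; a minimal sketch, assuming success probability $p = \frac{2}{3}$ (faces 3-6) and the rejection rule derived above:
# Simulate many repetitions of Experiment 2 and estimate the power
Nsim <- 10000                 # number of simulated experiments (assumed value)
x <- rbinom(Nsim, 9, 2/3)     # number of successes in 9 rolls when the true p is 2/3
mean(x <= 1 | x >= 8)         # estimated power, roughly 0.14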
Experiment 3: Same as Experiment 1 except that a 0 is recorded for failure and a 1 is recorded for success, with success now defined by a different set of faces.
Data and outcomes:
$X$ equals the number of successes in the $n = 9$ trials. Thus, $x$ equals the number of recorded 1's.
Solution: Outcomes from 21 experiments
With the same kind of calculations as above, we find that we reject the null hypothesis in 14 out of 21 trials.
The Power of the One-sided z-test for the Mean
The one-sided $z$-test for the mean is based on a random sample $x_1, \ldots, x_n$ where $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$ are independent and $\sigma^2$ is known. The power of the test for an arbitrary $\mu$ can be computed as:
$$\beta(\mu) = 1 - \Phi\left(\frac{\mu_0 - \mu}{\sigma/\sqrt{n}} + z_{1-\alpha}\right)$$
Details
The one-sided $z$-test for the mean is based on a random sample $x_1, \ldots, x_n$ where $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$ are independent and $\sigma^2$ is known. If the hypotheses are $H_0: \mu = \mu_0$ vs $H_1: \mu > \mu_0$, then we know that, if $H_0$ is true,
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1).$$
Given data $x_1, \ldots, x_n$, the $z$-value is
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}.$$
We reject $H_0$ if $z > z_{1-\alpha}$.
The level of this test is:
$$P_{\mu_0}\left[\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} > z_{1-\alpha}\right] = \alpha$$
since $\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1)$ when $\mu_0$ is the true value.
The power of the test for an arbitrary $\mu$ can be computed as follows:
$$\beta(\mu) = P_\mu\left[\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} > z_{1-\alpha}\right] = P_\mu\left[\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > \frac{\mu_0 - \mu}{\sigma/\sqrt{n}} + z_{1-\alpha}\right]$$
We obtain:
$$\beta(\mu) = 1 - \Phi\left(\frac{\mu_0 - \mu}{\sigma/\sqrt{n}} + z_{1-\alpha}\right)$$
Examples
Suppose we know $\sigma = 2$ and we will take a sample from $N(\mu, \sigma^2)$, intending to test the hypothesis $H_0: \mu = 3$ at level $\alpha = 0.05$. We want to know the power against a one-tailed alternative when the true mean is actually $\mu = 4$ and the sample size is $n = 25$.
We can set this up in R with:
alpha <- 0.05
n <- 25
sigma <- 2
mu0 <-3
mu <-4
zcrit <- qnorm(1-alpha)
Sticking the formula into R gives
> 1-pnorm((mu0-mu)/(sigma/sqrt(n))+zcrit)
[1] 0.803765
On the other hand, one can also use a simple simulation approach.
First, decide how many samples are to be simulated (Nsim).
Then, generate all of these samples, arrange them in a matrix and compute the mean of each sample.
The $z$-value of each of these Nsim tests is then computed, along with a check of whether it exceeds the critical point (1) or not (0).
Nsim <- 10000
m <- matrix(rnorm(Nsim*n,mu,sigma),ncol=n)
mn <- apply(m,1,mean)
z <- (mn-mu0)/(sigma/sqrt(n))
i<-ifelse(z>zcrit,1,0)
Compute the sum:
> sum(i/Nsim)
[1] 0.8081
Power and Sample Size for the One-sided z-test for a Single Normal Mean
Suppose we want to test $H_0: \mu = \mu_0$ vs $H_1: \mu > \mu_0$. We will reject $H_0$ if the observed value
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
is such that $z > z_{1-\alpha}$.
Details
Suppose we want to test $H_0: \mu = \mu_0$ vs $H_1: \mu > \mu_0$. Based on $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$, independent and identically distributed with $\sigma^2$ known, we will reject $H_0$ if the observed value
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
is such that $z > z_{1-\alpha}$.
The power is given by:
$$\beta(\mu) = P_\mu\left[\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} > z_{1-\alpha}\right] = 1 - \Phi\left(\frac{\mu_0 - \mu}{\sigma/\sqrt{n}} + z_{1-\alpha}\right)$$
and describes the probability of rejecting $H_0$ when $\mu$ is the correct value of the parameter. Suppose we want to reject $H_0$ with a prespecified probability $\beta_1$, when $\mu_1$ is the true value of $\mu$. For this, we need to select the sample size $n$ so that $\beta(\mu_1) \geq \beta_1$,
i.e. find $n$ which satisfies
$$1 - \Phi\left(\frac{\mu_0 - \mu_1}{\sigma/\sqrt{n}} + z_{1-\alpha}\right) \geq \beta_1.$$
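Rearranging the inequality gives the familiar closed form $n \geq \left(\frac{(z_{1-\alpha} + z_{\beta_1})\sigma}{\mu_1 - \mu_0}\right)^2$. A minimal R sketch, using the parameter values of the example below and an assumed target power of $\beta_1 = 0.9$:
# Smallest n for the one-sided z-test to have power >= 0.9
alpha <- 0.05
beta1 <- 0.9                     # desired power (assumed target)
mu0 <- 10; mu1 <- 11; sigma <- 2 # values from the example below
n <- ((qnorm(1-alpha) + qnorm(beta1)) * sigma / (mu1-mu0))^2
ceiling(n)                       # about 35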
Examples
mu0 <-10
sigma <- 2
mu1 <- 11
n <- 50
d <- (mu1-mu0)
power.t.test(n=n,delta=d,sd=sigma,sig.level=0.05,type="one.sample",alternative="one.sided",strict=TRUE)
One-sample t test power calculation
n = 50
delta = 1
sd = 2
sig.level = 0.05
power = 0.9672067
alternative = one.sided
The Non-central t-Distribution
Recall that if $Z \sim N(0, 1)$ and $W \sim \chi^2_\nu$ are independent, then
$$\frac{Z}{\sqrt{W/\nu}} \sim t_\nu,$$
and it follows, for a random sample $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$, independent, that
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}.$$
Details
On the other hand, if $Z \sim N(\Delta, 1)$ and $W \sim \chi^2_\nu$ are independent, then
$$\frac{Z}{\sqrt{W/\nu}}$$
has a non-central $t$ distribution with $\nu$ degrees of freedom and non-centrality parameter $\Delta$. This distribution arises if $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$, independent, and we want to consider the distribution of:
$$T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{(\bar{X} - \mu_0)/(\sigma/\sqrt{n})}{S/\sigma}$$
where $\mu \neq \mu_0$, which is a non-central $t$ with non-centrality parameter:
$$\Delta = \frac{\mu - \mu_0}{\sigma/\sqrt{n}}$$
with $\nu$ df. Here $\nu = n - 1$ since $\bar{X}$ and $S^2$ are computed from a sample of size $n$ in this equation.
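A small simulation sketch (all parameter values below are assumed, chosen only for illustration) showing that the $t$-statistic indeed follows a non-central $t$ distribution when $\mu \neq \mu_0$; the simulated tail probability can be compared with R's pt function, which accepts a non-centrality parameter ncp:
n <- 10; mu <- 1; mu0 <- 0; sigma <- 2; Nsim <- 50000
x <- matrix(rnorm(Nsim*n, mu, sigma), ncol=n)                 # Nsim samples of size n
tstat <- (apply(x, 1, mean) - mu0)/(apply(x, 1, sd)/sqrt(n))  # t-statistics
ncp <- (mu - mu0)/(sigma/sqrt(n))                             # non-centrality parameter
mean(tstat > 2)                                               # simulated P[T > 2]
pt(2, df=n-1, ncp=ncp, lower.tail=FALSE)                      # non-central t probability, close to the simulation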
The Power of T-test for a Normal Mean (warning: Errors)
Details
WARNING: This is all wrong and needs to be rewritten.
Consider $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$, independent and identically distributed, where $\sigma^2$ is unknown, and we want to test $H_0: \mu = \mu_0$ vs. $H_1: \mu > \mu_0$. We know that
$$T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \sim t_{n-1}$$
if $H_0$ is true, and we will reject $H_0$ if the computed value
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
is such that $t > t_{1-\alpha, n-1}$.
The power of this test is:
$$\beta(\mu) = P_\mu\left[\frac{\bar{X} - \mu_0}{S/\sqrt{n}} > t_{1-\alpha, n-1}\right] = P_\mu\left[\frac{\bar{X} - \mu}{S/\sqrt{n}} > t_{1-\alpha, n-1} + \frac{\mu_0 - \mu}{s/\sqrt{n}}\right],$$
which is the probability that a $t_{n-1}$-variable exceeds $t_{1-\alpha, n-1} + \frac{\mu_0 - \mu}{s/\sqrt{n}}$.
WARNING: This is all wrong and needs to be rewritten (the $s$ in the line above is a random variable, so this makes no sense at all).
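For reference, a correct computation can be based on the non-central $t$ distribution introduced above: the power of the one-sided test is the probability that a non-central $t_{n-1}$ variable with non-centrality parameter $\Delta = \frac{\mu - \mu_0}{\sigma/\sqrt{n}}$ exceeds $t_{1-\alpha, n-1}$. A minimal R sketch (the parameter values are assumed, matching the earlier z-test example) which agrees with power.t.test:
alpha <- 0.05
n <- 25; mu0 <- 3; mu <- 4; sigma <- 2   # assumed values
ncp <- (mu - mu0)/(sigma/sqrt(n))        # non-centrality parameter
pt(qt(1-alpha, n-1), df=n-1, ncp=ncp, lower.tail=FALSE)  # power of the one-sided t-test
power.t.test(n=n, delta=mu-mu0, sd=sigma, sig.level=alpha,
             type="one.sample", alternative="one.sided")$power  # same value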
Power and Sample Size for the One Sided T-test for a Mean
Suppose we want to calculate the power of a one-sided $t$-test for a single mean (one sample). This can easily be done in R with the power.t.test command.
Details
Examples
For a one-sided power analysis we wish to test the following hypotheses:
For a one-sample test: $H_0: \mu = \mu_0$ vs. $H_1: \mu > \mu_0$ (or $H_1: \mu < \mu_0$).
For a two-sample test: $H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 > \mu_2$ (or $H_1: \mu_1 < \mu_2$).
In R, the power.t.test command is useful to calculate how many samples one needs to obtain a certain power of a test, but also to calculate the power when we have a given number of samples.
How many samples do I need to get a power of 0.95?
> power.t.test(power=.95, delta=1.5,sd=2, type="one.sample", alternative="one.sided")
One-sample t test power calculation
n = 20.67702
delta = 1.5
sd = 2
sig.level = 0.05
power = 0.95
alternative = one.sided
We would thus need a sample size of $n = 20.68 \approx 21$ samples to obtain a power of $0.95$ for our analysis.
With a sample size of $n = 45$, what will the power of my test be?
> power.t.test(n=45,delta=1.5,sd=2,sig.level=0.05,type="one.sample",alternative="one.sided")
One-sample t test power calculation
n = 45
delta = 1.5
sd = 2
sig.level = 0.05
power = 0.9995287
alternative = one.sided
This is done the same way for two samples, only by changing the type to two.sample.
For a two-sided power analysis, one only needs to change the alternative to two.sided.
If one is interested in doing a power analysis for an ANOVA test, this is done in a fairly similar way. With a given sample size of $n = 20$ per group:
> power.anova.test(groups=4, n=20, between.var=1, within.var=3)
Balanced one-way analysis of variance power calculation
groups = 4
n = 20
between.var = 1
within.var = 3
sig.level = 0.05
power = 0.9679022
To calculate the sample size needed to obtain a power of 0.90 for the test:
> power.anova.test(groups=4, between.var=1, within.var=3, power=.9)
Balanced one-way analysis of variance power calculation
groups = 4
n = 15.18834
between.var = 1
within.var = 3
sig.level = 0.05
power = 0.9
NOTE: n is number in each group
The Power of the 2-sided T-test
A power analysis for a two-sided $t$-test can be done in R using the power.t.test command.
Details
For a one-sample test, $H_0: \mu = \mu_0$ vs. $H_1: \mu \neq \mu_0$, the power.t.test command is useful to provide information for determining the minimum sample size one needs to obtain a certain power of a test:
power.t.test(n=,delta=,sd=,sig.level=,power=,type=c("two.sample","one.sample","paired"),alternative=c("two.sided"))
where:
n is the sample size
delta is the effect size
sd is the standard deviation
sig.level is the significance level
power is normally 0.8, 0.9 or 0.95
type is two.sample, one.sample or paired (the type selected depends on the research)
alternative is either one.sided or two.sided
Examples
How many samples do I need in my research to obtain a power of 0.8?
> power.t.test(delta=1.5,sd=2,sig.level=0.05,power=0.8,type=c("two.sample"),alternative=c("two.sided"))
Two-sample t test power calculation
n = 28.89962
delta = 1.5
sd = 2
sig.level = 0.05
power = 0.8
alternative = two.sided
NOTE: n is number in *each* group
So, one needs 29 samples per group (n = 29) to obtain a power level of 0.8 for this analysis.
The Power of the 2-sample One and Two-sided T-tests
The power of a two-sample, one-sided $t$-test can be computed as follows:
$$\beta(\mu_1 - \mu_2) = P\left[\frac{\bar{X} - \bar{Y}}{\sqrt{\frac{U}{n+m-2}}\sqrt{\frac{1}{n}+\frac{1}{m}}} > t_{1-\alpha, n+m-2}\right]$$
and the power of a two-sample, two-sided $t$-test is given by:
$$\beta(\mu_1 - \mu_2) = P\left[\left|\frac{\bar{X} - \bar{Y}}{\sqrt{\frac{U}{n+m-2}}\sqrt{\frac{1}{n}+\frac{1}{m}}}\right| > t_{1-\alpha/2, n+m-2}\right]$$
where $\bar{X}$ and $\bar{Y}$ are the two sample means, based on $n$ and $m$ observations respectively, and $U$ is the SSE.
Details
One-sided two-sample $t$-test: Suppose data are gathered independently from two normal populations, resulting in
$$X_1, \ldots, X_n \sim N(\mu_1, \sigma^2) \quad \text{and} \quad Y_1, \ldots, Y_m \sim N(\mu_2, \sigma^2),$$
where all data are independent; then
$$\bar{X} - \bar{Y} \sim N\left(\mu_1 - \mu_2, \ \sigma^2\left(\tfrac{1}{n}+\tfrac{1}{m}\right)\right).$$
The null hypothesis in question is $H_0: \mu_1 = \mu_2$ versus the alternative $H_1: \mu_1 > \mu_2$. If $H_0$ is true then the observed value
$$t = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{u}{n+m-2}}\sqrt{\frac{1}{n}+\frac{1}{m}}}$$
comes from a $t$ distribution with $n+m-2$ degrees of freedom and we reject $H_0$ if $t > t_{1-\alpha, n+m-2}$. The power of the test can be computed as follows:
$$\beta(\mu_1 - \mu_2) = P\left[\frac{\bar{X} - \bar{Y}}{\sqrt{\frac{U}{n+m-2}}\sqrt{\frac{1}{n}+\frac{1}{m}}} > t_{1-\alpha, n+m-2}\right] = P\left[T' > t_{1-\alpha, n+m-2}\right]$$
where $T'$ is a non-central $t$ variable with $n+m-2$ degrees of freedom and non-centrality parameter
$$\Delta = \frac{\mu_1 - \mu_2}{\sigma\sqrt{\frac{1}{n}+\frac{1}{m}}},$$
and $U$ is the SSE of the samples; $U/\sigma^2$ has a $\chi^2$ distribution with the appropriate degrees of freedom, $n+m-2$.
This is the probability that a non-central $t$-variable exceeds $t_{1-\alpha, n+m-2}$.
Two-sided two-sample $t$-test: In this case the null hypothesis is defined as $H_0: \mu_1 = \mu_2$ versus the alternative $H_1: \mu_1 \neq \mu_2$.
The power of the test can be computed as follows:
$$\beta(\mu_1 - \mu_2) = P\left[T' > t_{1-\alpha/2, n+m-2}\right] + P\left[T' < -t_{1-\alpha/2, n+m-2}\right]$$
where $T'$ is the same non-central $t$ variable as above and $U$ is the SSE of the samples; $U/\sigma^2$ again has a $\chi^2_{n+m-2}$ distribution.
Note that the power of a test can be obtained using the power.t.test function in R.
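For example, a minimal sketch (the group sizes, effect size and standard deviation are assumed values) comparing the non-central $t$ computation with power.t.test for the two-sample, one-sided case:
n <- 20; m <- 20; delta <- 1; sigma <- 2; alpha <- 0.05      # assumed values
ncp <- delta/(sigma*sqrt(1/n + 1/m))                         # non-centrality parameter
pt(qt(1-alpha, n+m-2), df=n+m-2, ncp=ncp, lower.tail=FALSE)  # power from the non-central t
power.t.test(n=n, delta=delta, sd=sigma, sig.level=alpha,
             type="two.sample", alternative="one.sided")$power  # same value (n is per group)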
Sample Sizes for Two-sample One- and Two-sided t-tests
The sample sizes should be chosen large enough to attain the desired power.
Details
Suppose we want to reject $H_0$ with a pre-specified probability $\beta_1$ when $\mu_1$ and $\mu_2$ are the true values of the group means.
For this, we need to select the sample sizes n and m so that the power is at least $\beta_1$, i.e. find n and m which satisfy
$$P\left[T' > t_{1-\alpha, n+m-2}\right] \geq \beta_1$$
for a two-sample, one-sided $t$-test (where $T'$ is the non-central $t$ variable of the previous section).
Similarly, for a two-sample, two-sided $t$-test we need to find n and m that satisfy
$$P\left[T' > t_{1-\alpha/2, n+m-2}\right] + P\left[T' < -t_{1-\alpha/2, n+m-2}\right] \geq \beta_1.$$
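In practice the required per-group sample size can be found with power.t.test; a minimal sketch with assumed values ($\mu_1 - \mu_2 = 1$, $\sigma = 2$, $\alpha = 0.05$, target power 0.9):
power.t.test(delta=1, sd=2, sig.level=0.05, power=0.9,
             type="two.sample", alternative="one.sided")$n   # per-group n, one-sided
power.t.test(delta=1, sd=2, sig.level=0.05, power=0.9,
             type="two.sample", alternative="two.sided")$n   # per-group n, two-sided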
A Case Study In Power
We want to compute power in analysis of covariance:
$$Y_{ij} = \mu_i + \beta x_{ij} + \epsilon_{ij}, \qquad i = 0, 1,$$
where the $\epsilon_{ij}$ are independent and identically distributed $N(0, \sigma^2)$. This can be done by simulation and can easily be expanded to other cases.
Handout
If you want to compute a power analysis in analysis of covariance,
$$Y_{ij} = \mu_i + \beta x_{ij} + \epsilon_{ij},$$
where the $\epsilon_{ij}$ are independent identically distributed $N(0, \sigma^2)$, then use simulation.
To do this one needs to first define the task in more detail, along with what exactly is known and what the assumptions are.
Note that there are only two groups, with intercepts $\mu_0$ and $\mu_1$.
The "power" will refer to the power of a test for , i.e. we want to test whether the group means are equal, correcting for the effect of the continuous variable x
.
In principle, the x-values will either be fixed a priori or they may be a random part of the experiment.
Here we will assume that the x-values are randomly selected in the range 20-30 (they could e.g. be the ages of patients).
Since this is in the planning stage of the experiment, we also have a choice of the sample size within each group.
For convenience, the sample sizes are taken to be the same in each group, J, so the total number of measurements will be n = 2J.
We also need to decide at which levels of $\mu_0$ and $\mu_1$ the power is to be computed (but it is really only a function of the difference, $\delta = \mu_1 - \mu_0$).
The following pieces of R code can be saved into a file, ancovapow.r, and then the command source("ancovapow.r") can be used to run the whole thing.
The beginning of the command sequence merely consists of comments and definitions of parameter values. These need to be changed for each case separately.
#
# ancovapow.r - power computations for analysis of covariance
# - one factor, two levels mu0, mu1
# - one covariate x, x0 stores possible values from which a random set is chosen
#
# first set values of parameters
#
alpha <- 0.05
sigma <- 7.5 # the common standard deviation
x0 <- 20:30 # the set of x values
delta <- 10 # the difference in the means
mu0 <- 0 # the first mean
mu1 <- mu0+delta # the second mean
slope <- 2.5 # the slope in the ancova
J <- 10 # the common sample size per factor level
n <- 2*J # the total sample size
Nsim <- 40000 # the number of simulations for power computations
Rather than head straight for the ANCOVA, start with a simpler case, namely ignoring the covariate (x) and merely doing a regular two-sample, two-tailed t-test.
This should be reasonably similar to the ANCOVA power computations anyway.
#
# Next do the power computations just for a regular two-sided, two-sample t-test
# and use simulation
#
Y1 <- matrix(rnorm(J*Nsim,mu0,sigma),ncol=J) # Simulate Nsim samples of size J, ea N(mu0,sigma^2)
Y2 <- matrix(rnorm(J*Nsim,mu1,sigma),ncol=J) # Simulate Nsim samples of size J, ea N(mu1,sigma^2)
y1mn <- apply(Y1,1,mean) # compute all the simulated y1-means
y2mn <- apply(Y2,1,mean) # compute all the simulated y2-means
sy1 <- apply(Y1,1,sd) # compute all the simulated y1-std.devs
sy2 <- apply(Y2,1,sd) # compute all the simulated y2-std.devs
s <- sqrt(((J-1)*sy1^2+(J-1)*sy2^2)/(n-2)) # compute all the pooled std.devs
t <- (y1mn-y2mn)/(s*sqrt(1/J+1/J)) # compute all the Nsim t-statistics
i <- ifelse(abs(t)>qt(1-alpha/2,n-2),1,0) # for ea t, compute 1=reject, 0=do not reject
powsim2 <- sum(i)/Nsim # the simulated power
cat("The simulated power is ",powsim2,"\n")
The above gave the simulated power. In R there is a function to do the same computations and it is worthwhile to verify the code (and approach) by checking whether these give the same thing:
#
# Then compute the exact power for the t-test
#
pow2 <- power.t.test(delta=delta,sd=sigma,sig.level=alpha,n=J ,type=c("two.sample"),alternative=c("two.sided"))
cat("The exact power:\n")
print(pow2)
Finally, start setting up the code to do the ANCOVA simulations. Note that for this we need to generate the x-values. In this example it is assumed that the x-values are not under the control of the experimenter but arrive randomly, in the range from 20 to 30 (they could e.g. be the ages of participants in an experiment).
#
# Finally compute the power in the ANCOVA - note we already have simulated Y1, Y2-values but have not added the x-part yet
#
x1 <- matrix(sample(x0,Nsim*J,replace=T),ncol=J) # simulate x-values for y1
x2 <- matrix(sample(x0,Nsim*J,replace=T),ncol=J) # simulate x-values for y2
Y1 <- Y1+slope*x1
Y2 <- Y2+slope*x2
fulldat <- cbind(Y1,Y2,x1,x2) # a row now contains all y1, then all y2, then all x1, then all x2; Nsim rows
Rather than try to write code to do an ANCOVA, it is natural to use the R function lm to do this. The "trick" below is to extract the p-value from the summary command. By defining a function which takes a single row as an argument, it will subsequently be possible to use the function to extract the p-values using a one-line R command.
ancova.pval<-function(onerow){ # extract the ancova p-value for diff in means
J <- length(onerow)/4
n <- 2*J
y <- onerow[1:n] # get the y-data from the row
x <- onerow[(n+1):(2*n)] # get the x-data from the row
grps <- factor(c(rep(1,J),rep(2,J))) # define the groups
sm <- summary(lm(y~x+grps)) # fit the ancova model
pval <- sm$coefficients[3,4] # extract exactly the right thing from the summary command-the p-value for H0:mu1=mu2
return(pval)
}
Everything has now been defined, so it is possible to compute all the p-values in a single command line:
pvec <- apply(fulldat,1,ancova.pval)
i2 <- ifelse(pvec<alpha,1,0) # for ea test, compute 1=reject, 0=do not reject
ancovapow <- sum(i2)/Nsim # the simulated power
cat("The simulated ancova power is ",ancovapow,"\n")
When run, this script returns:
The simulated power is 0.803025
The exact power:
Two-sample t test power calculation
n = 10
delta = 10
sd = 7.5
sig.level = 0.05
power = 0.8049123
alternative = two.sided
NOTE: n is number in each group
The simulated ancova power is 0.775175.
It is seen that when the x-values are not included in any way (in particular, not in the model), the power is about 80.5%.
However, this is not the correct model in the present situation.
Using the above value of the slope in the ANCOVA model and taking this into account, the power is actually a bit lower, or about 77.5%.